High speed switching architecture

ABSTRACT

The present invention provides a high speed non-blocking buffered banyan packet switching architecture which utilizes parallel switching fabrics to switch slices of serial packets (subpackets) in a parallel manner. Serial digital information is received, converted into parallel form, buffered, and introduced into a parallel interconnect network which provides separate parallel paths for each packet/subpacket of information. The parallel subpackets are multiplexed, switched, demultiplexed, and recombined by way of a parallel-to-serial converter and an output port controller so as to reconstitute the original serial data stream, thereby providing high speed effective data switching at relatively low clock speeds.

TECHNICAL FIELD

The present invention relates to a high speed switching architecture, particularly for ATM or fast packet switches.

BACKGROUND ART

Banyan-based architectures are one type of space division packet switching. However, while banyan-based switches have fewer crosspoints than other techniques, they require a means of overcoming blocking, improving throughput and reducing cell loss. This is because of the contention that occurs at a crosspoint when two (or more) inputs want to access the same outlet. These `means` therefore further classify the banyan-based switches into either buffered-banyan or batcher-banyan architectures. The buffered banyan architectures have buffers at the points of contention while the batcher-banyan architectures minimise the contention by sorting the input cells. The buffered banyan architecture has been adopted to realise a switching fabric subsystem. However, these generally involved several levels of buffers at the input, output and intermediate switching stages.

DISCLOSURE OF INVENTION

It is an object of the present invention to provide an architecture which improves throughput and minimises the delay experienced by a packet or ATM cell passing through the switching fabric.

This is achieved by providing a manyfold parallel path internal switch architecture, which requires minimal buffering and multiplexing. According to one aspect the present invention comprises a packet switch, comprising a switching fabric unit (SFU) having a plurality of inputs and a plurality of outputs, each input and output having a respective port controller means, wherein said input port controller means are adapted to convert each input serial packet into a plurality of parallel packets, said SFU including internal parallel paths for each of said plurality of packets, and said output port controller means including means for converting said parallel packets into a serial packet form as input.

The invention will be described with reference to a 16×16 switching architecture, i.e., an architecture for switching input ATM cells from 16 inputs and switching to any one of 16 outputs. However, it will be appreciated that the inventive concept is equally applicable to other n×n switches.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a schematic view of one embodiment of a switch according to the present invention;

FIG. 2 is a conceptual view of a format of a packet;

FIG. 3 is an illustration of a switching fabric unit architecture according to one embodiment of the invention;

FIG. 4 illustrates schematically in part one switching fabric architecture;

FIG. 5 illustrates schematically a preferred switching fabric architecture;

FIG. 6 illustrates the multi-plane switch architecture; and

FIG. 7 illustrates the FIFO buffer architecture.

DETAILED DESCRIPTION

Referring to FIG. 1, a schematic block diagram conceptually illustrates a switch 10 comprising a switching fabric with inputs 0-15 and outputs 0-15, i.e. a 16×16 switching fabric. The switch also includes input port controllers 30_(n) on each input 0-15 and output port controllers 20_(n) on each output 0-15. In a suitable construction input and output port controllers may be the same unit.

Packets to be switched preferably arrive at the input port controller in the form of ATM frames. Referring to FIG. 2, an ATM frame according to an embodiment of the invention comprises a header of at least 3 bytes, and an ATM cell as defined by CCITT recommendation I.361 comprising 53 bytes as payload.

Input port controllers 30_(n) convert incoming serial ATM frames into an 8 bit wide data stream. The serial ATM frames are converted to parallel packets by sequentially placing received bits onto each parallel link. The output port controllers 20_(n) perform the reverse operation.
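The conversion described above can be sketched in a few lines. This is a minimal illustration only, assuming that the first received bit is placed on link 0 and that the frame length is a multiple of the link width; the function names are hypothetical and do not appear in the specification.

```python
def serial_to_parallel(bits, width=8):
    """Group a serial bit sequence into width-bit words.

    Assumption: bits are placed on the parallel links in arrival order,
    so the first received bit lands on link 0.
    """
    if len(bits) % width != 0:
        raise ValueError("bit count must be a multiple of the link width")
    return [tuple(bits[i:i + width]) for i in range(0, len(bits), width)]


def parallel_to_serial(words):
    """Reverse operation, as performed at an output port controller."""
    return [bit for word in words for bit in word]


frame_bits = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
words = serial_to_parallel(frame_bits)
# The round trip restores the original serial stream.
assert parallel_to_serial(words) == frame_bits
```

Each tuple in `words` represents the eight bits presented simultaneously on one of the 8-fold parallel links.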

It will therefore be appreciated that links 21_(n), 31_(n) between the SFU and output and input port controllers are in fact each 8-fold parallel connections.

The switching fabric 10 according to the present invention comprises four parallel planes, each plane being a 16×16 switching fabric sub-unit 16, 17, 18, 19, as can be seen in concept from FIG. 6. Thus, two bit wide slices of the 8-bit wide data stream are received by each 16×16 switching fabric sub-unit 16, 17, 18, 19, wherein the replicated address header for each respective plane is identical.

A general architecture for a 16×16 switching fabric sub-unit constructed from 4×4 elements 11 and 12 is shown in FIG. 3. Other architectures at this level may be used within the scope of the invention, but this architecture will be used by way of example.

FIG. 4 shows one embodiment of the invention in detail. This corresponds to two 4×4 elements 11 and 12 of a single plane of the switching fabric with interconnect as indicated.

Two bit wide inputs 39 are converted to an 8-bit wide data stream by serial to parallel converters 40, and enter 8-bit wide FIFO buffer 42. Interconnect network 43 provides separate parallel paths for each frame segment from buffer 42 to the addressed multiplexer 44.

Multiplexer 44 is effectively controlled by the routing headers on the cells, and routes the inputs via link 45 to input FIFO buffers 46 of the second switching stage. Again, interconnect network 47 provides separate parallel paths for each packet to the addressed one of multiplexers 48. Parallel connection 49 connects to parallel to serial converters 50, which each produce a 2 bit wide output from 8 bit wide input 49, and hence output 51 comprises 2 bit wide data for output to the respective output port controller.

It will be appreciated that the various slices from all the planes will be recombined at the output port controller to reconstitute the original serial data stream.
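The slicing of the 8-bit stream across four planes and the recombination at the output port controller can be sketched as follows. This is an illustration under stated assumptions: plane k is assumed to carry bits 2k and 2k+1 of each 8-bit word, the routing header is modelled as a single integer copied to every plane, and the function names are hypothetical.

```python
PLANES = 4       # four parallel 16x16 switching planes
SLICE_BITS = 2   # each plane carries a 2-bit wide slice


def slice_frame(header, words):
    """Split 8-bit words into four 2-bit slice streams, one per plane.

    Assumption: plane k carries bits 2k and 2k+1 of every word.
    The routing header is replicated unchanged on every plane.
    """
    subpackets = []
    for plane in range(PLANES):
        lo = plane * SLICE_BITS
        slices = [tuple(word[lo:lo + SLICE_BITS]) for word in words]
        subpackets.append({"header": header, "slices": slices})
    return subpackets


def recombine(subpackets):
    """Reassemble the 8-bit words from the four plane outputs."""
    headers = {sp["header"] for sp in subpackets}
    if len(headers) != 1:
        raise ValueError("planes disagree on the routing header")
    columns = zip(*(sp["slices"] for sp in subpackets))
    words = [tuple(bit for sl in column for bit in sl) for column in columns]
    return headers.pop(), words


words = [(1, 0, 1, 1, 0, 0, 1, 0), (0, 1, 1, 1, 0, 0, 0, 1)]
subpackets = slice_frame(9, words)
assert recombine(subpackets) == (9, words)
```

The round trip only succeeds if the slices from all planes arrive in step, which is why the planes must progress in alignment.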

A preferred embodiment is shown in FIG. 5, showing two 4×4 elements of a single plane.

The input 60 from input port controller 30₀ is a 2 bit wide slice presented to FIFO buffers 61. Once the packets reach the output of buffers 61, they are sent via parallel interconnect network 62 and gating means 69 to FIFO buffers 63 of the second stage. It should be appreciated that gating means 69 are effectively controlled by the routing headers on the cells, and if required gating means may involve more elaborate multiplexing. Once the packets are clocked to the end of buffers 63 and multiplexers 65 are available, the packets are sent via parallel interconnect network 64 to multiplexers 65 and then via output 66 to the respective output port controller.

It will be appreciated that in this embodiment, no further serial to parallel conversion is introduced beyond the port controller stage. Instead internal parallel paths alone are used to provide a non-blocking capability and improved throughput.

It will be appreciated that the configuration of FIG. 5 represents an improvement in throughput as compared with a basic buffered-banyan architecture. Using the preferred embodiment of the present invention, throughput reaches its limit at approximately 70% of offered load. Results previously published for a 16×16 single buffered-banyan network with 2×2 switching elements show limiting at about 52% (Jenq YC, "Performance Analysis of a Packet Switch based on Single-Buffered-Banyan Network", IEEE Journal on Selected Areas in Communications, Vol. SAC-1, No. 6, December 1983, pp. 1014-1021).

Implementation

The following discussion relates to one implementation of the invention and is not to be taken as limitative of the general scope of the invention.

This implementation uses 1 μm CMOS standard cell technology, principally because of availability; custom ASICs would probably result in a more nearly optimal arrangement.

The design chosen uses dual port RAM for the FIFO buffers to reduce chip area and power dissipation.

The implementation uses 4 Switching Fabric Chips (SFCs) operating in parallel as discussed previously. Each SFC switches 2 bits, i.e. one quarter of the byte which is input to port controller 30_(n). The data is clocked at about 20 MHz between the port controllers 30_(n) and SFU 10.

It is important to note that packets from all ports are aligned in time, and that progress through each parallel SFC is aligned, so that at the output the fragments of each packet may be reliably reassembled.

To provide control and timing to internal circuits within each SFC, a 20 MHz 2-phase clock with 90° phase shift is required. This is particularly required for the dual port RAM selected. Preferably, clock skew across the entire switch is less than 5 nanoseconds.

In order to maximise throughput, the SFC architecture should be optimised as much as possible. The key parameters for maximising throughput have been identified as:

buffer sizes and distribution

increasing internal transfer rate

simultaneous read from and write into buffers

cut-through capability.

The implementation shown in FIG. 5 has a number of advantages, including:

avoids serial to parallel and parallel to serial conversion

allows for read in and write out simultaneously from buffers (thereby reducing buffer size)

simplified control circuitry

simplifies cut-through implementation.

Each inlet 60 has a FIFO buffer 61 with a depth of 1 packet (i.e. 64 addressable locations) and a width of 2 bits. Each second stage element, however, has 16 FIFO buffers 63, each 1 packet deep. Hence, in the second stage there are a total of 64 FIFOs (allowing for the other 4×4 elements) and so the internal data transfer rate is effectively 160 Mbps. Stage 1 merely requires selection of the correct stage 2 buffer 63.
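The buffer counts and rates quoted above can be checked with a short calculation. This is a sketch only: the constant names are hypothetical, and the reading of the 160 Mbps figure as four parallel 2-bit paths clocked at 20 MHz is an assumption that reproduces the stated number, not a derivation given in the specification.

```python
CLOCK_HZ = 20_000_000          # data clock between port controllers and SFU
SLICE_BITS = 2                 # width of each FIFO buffer
PLANES = 4                     # one SFC per plane
FIFO_DEPTH = 64                # addressable locations per FIFO (1 packet)
ELEMENTS_PER_STAGE = 4         # 4x4 elements per stage in a 16x16 plane
STAGE1_FIFOS_PER_ELEMENT = 4   # one FIFO buffer 61 per inlet
STAGE2_FIFOS_PER_ELEMENT = 16  # FIFO buffers 63 per second stage element

stage1_fifos = ELEMENTS_PER_STAGE * STAGE1_FIFOS_PER_ELEMENT
stage2_fifos = ELEMENTS_PER_STAGE * STAGE2_FIFOS_PER_ELEMENT
total_fifos = stage1_fifos + stage2_fifos

# Rate of a single 2-bit path at the data clock.
path_rate_bps = SLICE_BITS * CLOCK_HZ

# Assumption: the fourfold increase in parallel paths between the stages
# gives an effective internal rate of four such paths.
internal_rate_bps = 4 * path_rate_bps

# Capacity of one FIFO, and of one packet position across all four planes.
bits_per_fifo = FIFO_DEPTH * SLICE_BITS
bytes_across_planes = bits_per_fifo * PLANES // 8

print(stage1_fifos, stage2_fifos, total_fifos)
print(path_rate_bps / 1e6, internal_rate_bps / 1e6)
print(bits_per_fifo, bytes_across_planes)
```

Under these figures a one-packet position across the four planes holds 64 bytes, which is enough for a frame made of a 3 byte header and a 53 byte ATM cell.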

In the implementation chosen, the 80 FIFO buffers (16 in stage 1 + 64 in stage 2) are implemented as a dual-port RAM operating as a FIFO with packets stored in parallel. This allows for a vastly reduced area requirement on the chip. A schematic illustration is shown as FIG. 7, for four FIFO buffers.

The RAM block is dual port to permit simultaneous read and write as in a FIFO. Since the buffers are combined in a block they have common address, read and write lines. Each of the four FIFO buffers in this RAM block, however, operates as a separate buffer. Since packets entering the SFC are synchronised the address can be identical for each buffer. But each buffer must have its own control for either reading the last packet and writing a new packet or storing the last packet. This is achieved with the multiplexers at the RAM write port which select either data already in the RAM or new data.
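The behaviour of such a shared-address RAM block can be modelled in software. The following is a behavioural sketch only, not the circuit of FIG. 7: the class and method names are hypothetical, and each clock tick is assumed to read one row through the read port and write the same row back through the write port, with a per-buffer multiplexer choosing between new data and the data already stored.

```python
class SharedAddressFifoBlock:
    """Behavioural model of several FIFO buffers held in one dual-port RAM.

    All buffers share a single address counter; each column of a row
    belongs to one buffer. Names and structure are illustrative only.
    """

    def __init__(self, buffers=4, depth=64):
        self.buffers = buffers
        self.depth = depth
        self.ram = [[0] * buffers for _ in range(depth)]
        self.address = 0  # common address for every buffer in the block

    def clock(self, new_data, accept_new):
        """Advance one clock tick.

        new_data   -- one value per buffer offered at the write port
        accept_new -- one flag per buffer: True writes the new value,
                      False recirculates the value already in the RAM
        Returns the row read out through the read port.
        """
        if len(new_data) != self.buffers or len(accept_new) != self.buffers:
            raise ValueError("one entry per buffer is required")
        old_row = list(self.ram[self.address])
        # Write-port multiplexers: new data or the data already stored.
        self.ram[self.address] = [
            new if take else old
            for new, old, take in zip(new_data, old_row, accept_new)
        ]
        self.address = (self.address + 1) % self.depth
        return old_row


block = SharedAddressFifoBlock(buffers=4, depth=2)
block.clock([1, 2, 3, 0], [True, True, False, False])  # buffers 0, 1 load
block.clock([3, 3, 3, 3], [True, True, True, True])
row = block.clock([0, 0, 0, 0], [False, True, False, False])
assert row == [1, 2, 0, 0]
```

In the last tick buffer 0 keeps its stored value while buffer 1 is overwritten, even though both share the same address.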

The differences between stages 1 and 2 arise because of the fourfold increase in parallel interconnection paths between them. From FIG. 5, it can be seen that stage 1 has 4 buffers, associated requester control and 16 output paths which are arranged in 4 groups of 4. The second stage, therefore, has 16 buffers arranged in parallel. The 4 outputs have access to each of the 16 buffers under the control of the granter associated with each output.
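One way to picture the two stages is as an address decode followed by per-output arbitration. The sketch below rests on assumptions not stated in the specification: the destination port number is assumed to split into a second stage element index (destination divided by 4) and an output line within that element (remainder), each switch input is assumed to own one dedicated buffer in every second stage element, and the granter is assumed to use fixed priority. All names are hypothetical.

```python
ELEMENT_SIZE = 4   # 4x4 switching elements
PORTS = 16         # 16x16 plane


def route(input_port, dest_port):
    """Return (stage 2 element, stage 2 buffer index, output line).

    Assumption: the destination selects the stage 2 element and its
    output line, and the input port selects a dedicated stage 2 buffer.
    """
    if not (0 <= input_port < PORTS and 0 <= dest_port < PORTS):
        raise ValueError("port numbers must be in the range 0-15")
    stage2_element = dest_port // ELEMENT_SIZE
    buffer_index = input_port
    output_line = dest_port % ELEMENT_SIZE
    return stage2_element, buffer_index, output_line


def grant(requests):
    """Pick one requesting buffer per output line of one stage 2 element.

    requests -- iterable of (buffer_index, output_line) pairs
    Assumption: fixed priority, lowest buffer index wins.
    """
    granted = {}
    for buffer_index, output_line in sorted(requests):
        granted.setdefault(output_line, buffer_index)
    return granted


assert route(5, 14) == (3, 5, 2)
assert grant([(7, 2), (3, 2), (9, 0)]) == {0: 9, 2: 3}
```

Because every input has its own second stage buffer in this model, two packets never compete for a path between the stages; contention is confined to the granter at each output.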

The RAM buffer blocks of the two stages differ slightly. Stage 1 delays the input before writing the packet into the main block of RAM, whereas stage 2 delays the data already stored in RAM. This occurs because of the requirement to have the input and output packets from the SFC aligned.

The maximum frequency of operation is to be 25 MHz. Thus, the time allowed for propagation delay, set-up time etc. is 40 nanoseconds. Under worst case conditions of temperature and process variations, a factor of 1.69 is used to calculate the maximum typical delay allowed for correct device operation. This equates to 23.67 nanoseconds. The design is to follow synchronous design rules, which means that flip flops are only clocked by the master clock. This simplifies the realisation process to 3 basic constraints:

1. the minimisation of clock skew over the chip;

2. keeping the propagation delay through combinational logic etc. between 2 flip flops or latches to less than 22 nanoseconds (allowance must also be made for set-up times, clock skew etc.); and

3. keeping the propagation delay through combinational logic etc. between a flip flop and RAM write port to less than 17 nanoseconds (allowance must also be made for set-up times, clock skew etc.).
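The timing figures above follow from simple arithmetic, reproduced here as a check. The variable names are hypothetical, and treating the difference between the typical budget and each path limit as the allowance left for set-up time and clock skew is an interpretation rather than a statement from the specification.

```python
MAX_CLOCK_HZ = 25_000_000   # maximum frequency of operation
WORST_CASE_FACTOR = 1.69    # temperature and process derating factor

# Clock period available for propagation delay, set-up time etc.
period_ns = 1e9 / MAX_CLOCK_HZ                     # 40 ns

# Maximum typical delay after applying the worst case factor.
typical_budget_ns = period_ns / WORST_CASE_FACTOR  # about 23.67 ns


def margin_ns(path_limit_ns):
    """Budget remaining once a path uses its full propagation limit.

    Assumption: this remainder is what is left for set-up and skew.
    """
    return typical_budget_ns - path_limit_ns


print(f"period: {period_ns:.2f} ns")
print(f"typical budget: {typical_budget_ns:.2f} ns")
print(f"flip flop to flip flop margin: {margin_ns(22):.2f} ns")
print(f"flip flop to RAM write port margin: {margin_ns(17):.2f} ns")
```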

Further improvements in throughput can be achieved with variations in the architectures which are still within the scope of the invention.

It will be appreciated that the preferred embodiment is adapted to allow easy implementation using integrated circuit techniques. Further, the inventive concept is applicable to other n×n switches, using other sub-elements than 4×4, e.g. 2×2 if desired. Variations and additions within the spirit and scope of the invention will be apparent to the skilled addressee and are incorporated within this application.

What is claimed is:
 1. A multistage space division packet switch, comprising a switching fabric unit SFU having a plurality of inputs and a plurality of outputs, each input having an input port controller means and each output having an output port controller means, wherein said input port controller means are adapted to convert each input serial packet into a plurality of parallel subpackets, said SFU including internal parallel switching planes, each of said parallel switching planes comprising a first stage including a first buffer means, a parallel interconnect network, and a second stage comprising at least one buffer means associated with each of a plurality of addressable outputs, and said output port controller means including means for converting said parallel subpackets into a serial packet form for output.
 2. A packet switch according to claim 1, wherein said packets are self addressing.
 3. A packet switch according to claim 1, wherein said internal planes are synchronized such that said parallel subpackets arrive in the correct order at said output port controller means.
 4. A packet switch according to claim 3, wherein said packets are self addressing.
 5. A packet switch comprising a plurality of packet switches according to claim 1.
 6. A method of packet switching, in a system comprising a switching fabric unit SFU having a plurality of inputs and a plurality of outputs, each input having an input port controller means and each output having an output port controller means, comprising the steps of: converting each input serial packet into a plurality of parallel subpackets at said input port controller means; inputting each of said plurality of parallel subpackets to one of a set of parallel switching planes in said SFU; switching each of said plurality of parallel subpackets over a set of internal parallel paths within said plane; outputting from said SFU said plurality of parallel subpackets; and reassembling each serial packet from said plurality of parallel subpackets at said output port controller means.